Abstract: High resolution satellite image sequences are multidimensional signals composed of
spatio-temporal patterns associated with numerous and various phenomena. Bayesian methods have
previously been proposed in [5] to code the information contained in satellite image sequences in a
graph representation. Based on such a representation, this paper presents a supervised learning
methodology for the semantics associated with spatio-temporal patterns occurring in satellite image
sequences. It enables the recognition and the probabilistic retrieval of similar events. Indeed, graphs are
attached to statistical models of spatio-temporal processes, which in turn describe physical changes
in the observed scene. Therefore, we adjust a parametric model evaluating similarity types between graph
patterns in order to represent user-specific semantics attached to spatio-temporal phenomena. The
learning step is performed by the incremental definition of similarity types via user-provided examples of
spatio-temporal patterns attached to positive and/or negative semantics. From these examples,
probabilities are inferred using a Bayesian network and a Dirichlet model. This links user interest to a
specific similarity model between graph patterns. According to the current state of learning, semantic
posterior probabilities are updated for all possible graph patterns so that similar spatio-temporal
phenomena can be recognized and retrieved from the image sequence. A few experiments performed on a
multispectral SPOT image sequence illustrate the proposed spatio-temporal recognition method.

Key-words: Pattern recognition; supervised learning; spatio-temporal phenomena; graph similarity;
Bayesian networks; Dirichlet model
Supervised learning on graphs of spatio-temporal similarity
in satellite image sequences



Summary: High resolution satellite image sequences are multidimensional signals composed of
spatio-temporal patterns associated with numerous and varied phenomena. Bayesian methods were
previously proposed in [5] to code the information contained in satellite image sequences in the form
of graphs. Based on such a representation, this paper presents a method for the supervised learning of
semantics associated with the spatio-temporal patterns of these image sequences. This enables the
recognition and the probabilistic retrieval of similar phenomena. Indeed, the graphs represent
statistical models of spatio-temporal processes, which describe physical changes observed in the scene.
Consequently, through supervised learning, a parametric model evaluating the types of similarity between
graph patterns is adjusted to represent the semantics associated with these spatio-temporal phenomena.
Learning is carried out through the incremental definition of similarity types via user-provided
examples of patterns associated with positive and/or negative semantics. From these examples,
probabilities are derived using a Bayesian network and a Dirichlet model. These probabilities link the
user's interest to a specific similarity model between graph patterns. At each stage of learning, the
posterior probabilities are updated for the whole set of possible graph patterns so that spatio-temporal
phenomena can be recognized and retrieved in the image sequence. A few experiments carried out on a
multispectral sequence of SPOT images illustrate the proposed spatio-temporal recognition method.

Keywords: Pattern recognition; supervised learning; spatio-temporal phenomena; graph similarity;
Bayesian networks; Dirichlet model
1 Introduction

During the last decades, imaging satellite sensors have acquired huge quantities of data, enabling
the construction of satellite image sequences. However, our capability to store large volumes of data has
greatly exceeded our capability to extract and interpret the relevant information. Therefore, information
learning systems for satellite image sequences are needed to bridge the semantic gap between the
information extracted from temporal and pictorial multidimensional data and user-specific interests.
Indeed, satellite image sequences are complex objects with a rich information content. They contain
numerous and various spatio-temporal structures. For example, in rural scenes, one can observe the growth
and maturation of crops, their harvests, the evolution of ploughland, river floods, etc. Near urban areas,
car and plane occlusions are frequent, but there are also evolving constructions, pollution phenomena, etc.
Spatio-temporal analyses are useful to understand complex evolutions in various domains
such as agriculture, forest monitoring, ecology, hydrology, urbanization, etc.

Experiments presented in this paper were performed using a satellite image sequence composed of SPOT
multispectral images of 2000x3000 pixels. The spatial resolution is 20 meters. The acquired
scene is a rural area located east of Bucharest (Romania). The acquisition campaign was conducted
in order to provide remote sensing data for the Data Assimilation for Agro-Modeling (ADAM) project.
The sequence was obtained by daily acquisition and by filtering out images presenting cloud or snow
cover above the project test sites. This selection procedure resulted in 38 images irregularly sampled in
time, acquired over a period of 286 days. The images were then co-registered and a
radiative transfer model was applied to produce reflectance measurements. The ADAM project satellite
image sequence is available on-line [15].

To exploit the information content of satellite image sequences, previous work established an information
flow between the content of satellite image sequences and user interest by hierarchically modeling that
information content [5]. On the first levels of the hierarchical modeling, strong
families of models are applied to extract information using inference based on Bayesian and entropic
methods. This unsupervised modeling results in a graph representation coding the information content
of satellite image sequences. More precisely, the time-evolution of the distribution of
features extracted at consecutive times from the image sequence is modeled. This modeling
results in a set of cluster trajectories, possibly splitting and merging in time, which are grouped into a
graph $\mathcal{G}$.

Figure 1: Results of a probabilistic search of spatio-temporal patterns possessing plowing semantics, retrieved in
space (red class) and time (period written under the 3 image sequences) within a satellite image sequence.



Based on this objective graphical signal characterization, we focus in this paper on a very important step
in providing content-based query techniques: the interaction with the user and the flexible
incorporation of user-specific interests. This constitutes the last level of the global hierarchical information
modeling introduced in [5]. However, the Bayesian learning of similarity between graph patterns, which is the
kernel of this last inference level, is not presented in the latter article. Therefore, the aim of the present
paper is to describe this learning methodology, which employs examples of spatio-temporal processes provided
on-line by the user.



The goal of such a supervised learning procedure is the inference of similarity measurements between the
spatio-temporal processes present in the image sequences, which can then enable the retrieval of
phenomena in space and time. Indeed, spatio-temporal processes present in a given time and spatial window of
the satellite image sequence can possess subjective user-specific semantics (e.g. harvests, wheat harvests
or crop changes in general). A user may be interested in retrieving similar events and thus may want
to know when and where similar spatio-temporal patterns have occurred. An example of probabilistic
retrieval of spatio-temporal patterns occurring in an image sequence according to a user semantic is given
in Fig. 1. Moreover, as the graph patterns $\mathcal{G}_k$ contained in $\mathcal{G}$ are stochastic models
for these spatio-temporal patterns, they can also possess a user semantic. Therefore, we are interested in
learning a semantic from a user in order to achieve a semantic labeling of the graph patterns representing
spatio-temporal patterns, which enables the recognition and the probabilistic retrieval of similar
spatio-temporal phenomena.

Until now, learning methods for satellite image sequences have been dedicated to the analysis and
recognition of particular spatio-temporal phenomena in relation to applications such as change detection [1],
data assimilation for agriculture monitoring [TU] or wind field extraction [7]. Although these techniques
are efficient, together they cover a limited range of applications. Only a few methods, mainly
focusing on low resolution images regularly sampled in time [1] [12] [H], have been developed to
adapt to a broader range of applications. However, to access the variety of information contained in
high resolution satellite image sequences, collaborative and generic methods are needed.

In this paper we propose an original learning method addressing this problem. The remainder of the
paper is organized as follows. After a description of the global supervised semantic modeling procedure,
we present the parametric model used for evaluating similarity between graph patterns. Then, we propose
a Bayesian approach for learning the distribution of the similarity parameters based on a Dirichlet model
and user-provided examples. The learning process leads to the estimation and semantic labeling
stages. Finally, after a section describing experimental results, a short summary concludes the discussion.

2 Bayesian modeling of user semantics 

The inference of the graph $\mathcal{G}$ is a robust and unsupervised coding of satellite image sequences.
Based on this objective signal characterization, we now focus on modeling, through user-provided examples,
the semantics attached to spatio-temporal patterns in satellite image sequences. The proposed supervised
learning approach is based on Bayesian networks [5] [3]. It aims at extending a previously proposed
learning system to spatio-temporal features.



RR n° 0123456789 






Heas & Datcu. 



In order to define a model for a given user semantic $A_\nu$, we introduce a parametric similarity cost
$S_\Phi(\mathcal{G}_0, \mathcal{G}_k)$ between the graph pattern $\mathcal{G}_k$ and a reference graph
pattern $\mathcal{G}_0$. Dynamic time warping schemes [3] constitute efficient approaches for evaluating
graph pattern similarities. However, the extension of such a distance measurement to multidimensional
graph features of heterogeneous nature is not obvious. A simple solution has been chosen here to deal with
such multidimensional graph patterns. We build a parametric model for similarity by extending the inexact
graph matching algorithm proposed in [5]. In this model, a parameter vector denoted by $\Phi$ weights the
contribution of each type of graph feature. This parametric model is detailed in section 3.

An intuitive assumption is that a given parameter vector corresponds to a particular similarity, which
can formalize a given user semantic. Therefore, parameters can be tuned in order to represent a given
user semantic. We will see in section 4 that the parameters $\Phi$ of the similarity model and the reference
graph $\mathcal{G}_0$ can be estimated via a supervised learning process relying on user-provided examples.
It is thus possible to link subjective elements $A_\nu$, representing user semantics, to graph patterns
$\mathcal{G}_k$. In this perspective, we make the hypothesis that a parametric similarity cost
$S_\Phi(\mathcal{G}_0, \mathcal{G}_k)$ constitutes a model $\mathcal{M}$ which is sufficient for describing
the different semantics. Introducing a normalization constant $Z$, we simply define the likelihood
probability of the semantic for each graph pattern $\mathcal{G}_k$ as:

$$p(\mathcal{G}_k \mid A_\nu, \mathcal{M}) = 1 - \frac{S_\Phi(\mathcal{G}_0, \mathcal{G}_k)}{Z}, \tag{1}$$

where $\Phi$ and $\mathcal{G}_0$ are respectively a parameter vector and a reference graph, both estimated
via learning with examples. To simplify notation, the conditioning of the likelihood on the model
$\mathcal{M}$ is omitted in the following.

Based on these likelihood probabilities, a Bayesian context enables the estimation of the posterior
probabilities $p(A_\nu \mid \mathcal{G}_k)$ and thus allows a semantic representation of the content of
satellite image sequences. Indeed, considering that a user provides positive and negative examples,
corresponding to a positive semantic $A_\nu$ and a negative semantic $\neg A_\nu$, two likelihood
probabilities $p(\mathcal{G}_k \mid A_\nu)$ and $p(\mathcal{G}_k \mid \neg A_\nu)$ can be derived for each
graph pattern. Moreover, graph priors can be obtained using the formula
$p(\mathcal{G}_k) = \sum_{A \in \{A_\nu, \neg A_\nu\}} p(\mathcal{G}_k \mid A)\, p(A)$, where the summation
is done over the positive and negative semantics. Thus, assuming a uniform prior on the semantics, the
posterior probabilities of the positive semantic are inferred using Bayes rule:

$$p(A_\nu \mid \mathcal{G}_k) = \frac{p(\mathcal{G}_k \mid A_\nu)\, p(A_\nu)}{p(\mathcal{G}_k)} = \frac{p(\mathcal{G}_k \mid A_\nu)}{p(\mathcal{G}_k \mid A_\nu) + p(\mathcal{G}_k \mid \neg A_\nu)}. \tag{2}$$

Thus, to achieve the posterior estimation, we need to define: (1) a parametric cost
$S_\Phi(\mathcal{G}_0, \mathcal{G}_k)$ for graph pattern similarity, to enable the evaluation of the
likelihood probabilities $p(\mathcal{G}_k \mid A_\nu)$ and $p(\mathcal{G}_k \mid \neg A_\nu)$; (2)
a method for learning by examples the model parameters $\Phi$ and $\mathcal{G}_0$ needed for the evaluation
of the previous likelihood probabilities. These points are detailed in the next two sections.
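
As a minimal sketch of the posterior computation of Eq. 2 under the uniform prior stated above, the
following Python function combines the two likelihoods of a graph pattern. The function name and the
handling of the degenerate case where both likelihoods are zero are assumptions made for illustration
only; they are not taken from the paper.

```python
def semantic_posterior(lik_pos, lik_neg):
    """Posterior p(A | G_k) under a uniform prior over {A, not A}.

    lik_pos and lik_neg stand for p(G_k | A) and p(G_k | not A).
    """
    if lik_pos < 0.0 or lik_neg < 0.0:
        raise ValueError("likelihoods must be non-negative")
    total = lik_pos + lik_neg
    if total == 0.0:
        # Assumption of this sketch: with no evidence either way, stay at the prior.
        return 0.5
    return lik_pos / total


# Hypothetical likelihood values for one graph pattern.
posterior = semantic_posterior(0.9, 0.3)
```

With a uniform prior the prior terms cancel, so only the ratio of the two likelihoods matters.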



3 Parametric model evaluating graph pattern similarity 

The idea of inexact graph matching is to transform one of the graph patterns into the other by
assigning a cost to each vertex or edge addition/removal. However, graph patterns $\mathcal{G}_k$ are
specific multidimensional temporal features which characterize parts of the dynamic cluster trajectories.
More precisely, they correspond to given classes of a multitemporal classification within a given temporal
window. The information is condensed in vertices and edges. A vertex represents a multivariate Gaussian
distribution related to a given spatial class at a given time. It is characterized by a pixel weight, Gaussian
parameters and a divergence measurement which has been used for the trajectory reconstruction. An
edge, representing the evolution of the cluster between two image samples, is characterized by a time
sampling delay, a pixel flow, the evolution of the Gaussian parameters and multitemporal intra-class changes
quantified by mutual information (for more details on the trajectory attributes, please refer to [8]). Let us
denote by $\{\zeta_l\}$ the set of attributes related to a graph pattern. The inexact graph matching algorithm
is then extended to a parametric distance model between graph patterns, weighting the different attribute
contributions.

Denoting by $\nu^1 = \{\nu_i^1\}$ and $\nu^2 = \{\nu_j^2\}$ the vertex sets of graph patterns $\mathcal{G}_1$
and $\mathcal{G}_2$, and denoting an extra set of vertices by $\Lambda = \{\lambda\}$, a mapping function
$\mathcal{F} = \{f\}$ composed of a given combination of elementary mapping functions
$f : \nu^1 \rightarrow \nu^{2'} = \nu^2 \cup \Lambda$ is defined. A cost $C_\Phi(f(\nu_i^1) = \nu_j^{2'})$ is
assigned to each elementary transformation. The cost function depends on the parameter vector
$\Phi = \{\phi_l\}$ and is composed of a weighted sum of similarities between the vertices $\nu_i^1$ and
$\nu_j^{2'}$ and their related edges. The cost is equal to

$$C_\Phi(f(\nu_i^1) = \nu_j^{2'}) = \sum_l \phi_l \, \Delta_l\big(\zeta_l(\nu_i^1), \zeta_l(\nu_j^{2'})\big), \tag{3}$$

where $\Delta_l(\cdot)$ represents a distance model which is either a difference for scalars or a similarity
cost between probability density functions such as the Kullback-Leibler divergence. The graph pattern
similarity is then defined, for a given parameter vector $\Phi$, by finding the least expensive combination
of elementary mapping functions over all possible mapping functions:

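$$S_\Phi(\mathcal{G}_1, \mathcal{G}_2) = \min_{\mathcal{F}} \Big( \sum_i C_\Phi(f(\nu_i^1) = \nu_j^{2'}) \Big). \tag{4}$$

Denoting by $S_l(\mathcal{G}_1, \mathcal{G}_2)$ the cost related to the parameter $\phi_l$ in the similarity
function $S_\Phi(\mathcal{G}_1, \mathcal{G}_2)$, Eq. 4 is rewritten as

$$S_\Phi(\mathcal{G}_1, \mathcal{G}_2) = \sum_l \phi_l \min_{\mathcal{F}} \Big( \sum_i \Delta_l\big(\zeta_l(\nu_i^1), \zeta_l(\nu_j^{2'})\big) \Big) = \sum_l S_l(\mathcal{G}_1, \mathcal{G}_2). \tag{5}$$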

In order to estimate the minima, an optimization procedure searches for a minimum cost
path in a tree containing all possible configurations of mapping functions. Because of the combinatorial
explosion of configurations and of real-time requirements, the tree is pruned during the search according to
the current cost assigned to the branches. This optimization procedure is sub-optimal for dense
graph patterns, with the potential drawback of converging to local minima. Thus, the pruning approach
constitutes an easy solution for matching simple graph patterns, i.e. with few vertices and edges. However,
we remark that optimization strategies based, for example, on graph-cuts [2] should be considered for more
complex graph patterns.
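
The tree search over mapping configurations can be illustrated by the following Python sketch. It is a
simplified illustration under several assumptions that are not taken from the paper: vertices carry only
scalar attributes, edges are ignored, mapping a vertex to the extra (null) vertex has a fixed hypothetical
cost `null_cost`, and unmatched vertices of the second pattern are not penalized. Unlike the real-time
pruning described above, this sketch only discards branches whose partial cost already reaches the best
complete cost found so far, so it remains exact on small patterns.

```python
import math


def match_cost(g1, g2, phi, null_cost=1.0):
    """Minimum cost of mapping each vertex of g1 to a distinct vertex of g2 or to a null vertex.

    g1, g2: lists of vertices, each vertex being a tuple of scalar attributes.
    phi: one weight per attribute.
    null_cost: hypothetical cost of mapping a vertex to the null vertex.
    """
    best = math.inf

    def vertex_cost(u, v):
        # Weighted sum of absolute attribute differences.
        return sum(w * abs(a - b) for w, a, b in zip(phi, u, v))

    def search(i, used, cost):
        nonlocal best
        if cost >= best:
            return  # prune: this branch cannot improve on the best mapping found
        if i == len(g1):
            best = cost
            return
        for j, v in enumerate(g2):
            if j not in used:
                search(i + 1, used | {j}, cost + vertex_cost(g1[i], v))
        # Alternative: map vertex i to the null vertex.
        search(i + 1, used, cost + null_cost)

    search(0, frozenset(), 0.0)
    return best


# Two patterns with the same vertices listed in a different order.
g1 = [(0.0, 1.0), (1.0, 0.0)]
g2 = [(1.0, 0.0), (0.0, 1.0)]
cost = match_cost(g1, g2, phi=[1.0, 1.0])
```

The number of explored configurations still grows combinatorially with the number of vertices, which is
why such a search is only practical for simple graph patterns.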

4 Learning the similarity model parameters 

In the previous section, we developed a similarity cost function between graph patterns which depends on
a parameter vector $\Phi$. The different components of this vector weight the different contributions,
related to the graph attributes $\zeta_l$, composing the global similarity cost $S_\Phi(\cdot)$. As already
mentioned, we make the assumption that a given parameter vector corresponds to a particular similarity
which can formalize a semantic related to a user. But the manual tuning of the parameters in order to define
a similarity specific to a semantic may represent a tedious or even impossible task for a user.
Therefore, a supervised learning procedure is needed to estimate the parameter vector $\Phi$, enabling, via
similarity costs, the evaluation of the semantic likelihoods $p(\mathcal{G}_k \mid A_\nu)$ and
$p(\mathcal{G}_k \mid \neg A_\nu)$, which are then used for the inference of the posterior probabilities
$p(A_\nu \mid \mathcal{G}_k)$.

We detail in the following how the parameter distribution related to the positive semantic likelihood is
learned from user-provided examples and how the parameter estimation process is performed. Parameters
related to the negative semantic likelihood are obtained in a similar framework. Finally, learning results
in the semantic labeling of the different graph patterns present in the satellite image sequence.

4.1 Multinomial models for discretized parameter distributions 

The idea for the supervised estimation of the similarity model parameters according to a given semantic
is the following: we consider a given reference graph pattern $\mathcal{G}_0$ and an example provided by the
user of a spatio-temporal phenomenon (i.e. a graph pattern $\mathcal{G}_k$) which possesses a given semantic
$A_\nu$; then, the lower the partial cost $S_l(\mathcal{G}_0, \mathcal{G}_k)$ related to the attribute
$\zeta_l$, the more important the weight $\phi_l$. In other words, we make the assumption that the parameter
value is proportional to the opposite of the cost function $S_l(\mathcal{G}_0, \mathcal{G}_k)$ related to the
attribute $\zeta_l$:

$$\phi_l \propto -S_l(\mathcal{G}_0, \mathcal{G}_k). \tag{6}$$

Let us now take advantage of the previous proportionality assumption. First, to allow a comparison
between the different parameters, we normalize the domain where the cost functions
$S_l(\mathcal{G}_0, \mathcal{G}_k)$ take their values. Then, as the estimation of a continuous distribution is
difficult when very little data is available, the continuous parameters $\{\phi_l\}$ are discretized in $r$
quantization levels, so that each parameter $\phi_l$ takes its values in $\{\phi^j\}_{j=1,\dots,r}$ and follows
a multinomial law. The number of quantization levels $r$ should be sufficiently large to approximate a
continuous distribution, and should also be chosen according to the number of examples provided by the user
during the learning process; in this work, $r$ was fixed to 1000. The multinomial distribution has the
advantage of possessing parameters linked to occurrence probabilities which, as we will see, can be estimated
in real time in a Bayesian context.

Thus, considering the user semantic $A_\nu$, the conditioned probability density function is defined for
$j = 1, \dots, r$ by

$$p(\phi_l = \phi^j \mid \omega, A_\nu) = p\big(\Lambda(S_l(\mathcal{G}_0, \mathcal{G}_k)) = \phi^j \mid \omega, A_\nu\big) = \omega_j, \tag{7}$$

where $\omega = \{\omega_2, \dots, \omega_r\}$ are the parameters of the multinomial model and
$\Lambda(\cdot)$ is an operator discretizing the normalized interval where the functions
$S_l(\mathcal{G}_0, \mathcal{G}_k)$ take their values into $r$ quantization levels
$\{\phi^j\}_{j=1,\dots,r}$. To simplify notation, $p(\phi_l = \phi^j \mid \omega, A_\nu)$ will be noted
$p(\phi_l^j \mid \omega, A_\nu)$.
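
The discretization step can be sketched in Python as follows. This is only an illustration under stated
assumptions: the partial cost is assumed to be already normalized to [0, 1], the decreasing relation between
cost and weight is taken to be the affine map `1 - cost`, and each level is represented by its mid-point.
None of these choices, nor the function names, are specified in the paper.

```python
def quantize_weight(partial_cost, r=1000):
    """Return the index (0 .. r-1) of the quantization level of the weight.

    partial_cost: normalized partial cost in [0, 1]; a low cost gives a high level.
    r: number of quantization levels.
    """
    if not 0.0 <= partial_cost <= 1.0:
        raise ValueError("partial_cost must be normalized to [0, 1]")
    weight = 1.0 - partial_cost  # assumption: affine decreasing map from cost to weight
    return min(int(weight * r), r - 1)


def level_value(j, r=1000):
    """Representative weight value of level j (assumption: mid-point of the level)."""
    return (j + 0.5) / r


# A user example with a small partial cost falls into a high quantization level.
level = quantize_weight(0.25, r=4)
```

Counting how often each level index occurs over the user-provided examples gives the occurrence counts used
by the multinomial model.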

Furthermore, statistical independence of the conditioned parameter distributions is assumed in order to
avoid the estimation of the joint probability distribution. Note that this assumption is necessary to reduce
the model complexity and thus to allow the interactive learning presented in the following.
However, the validity of such an assumption depends on the nature of the graph pattern attributes $\zeta_l$
used in the similarity model. For example, pixel flows can be assumed independent from Gaussian parameters. On
the contrary, mutual information is not necessarily independent from pixel flows. Nevertheless, assuming
the independence assumption valid, we obtain:

$$p(\Phi \mid A_\nu) = p(\phi_1 \mid A_\nu)\, p(\phi_2 \mid A_\nu) \cdots \tag{8}$$

4.2 Supervised learning of multinomial distributions

For a given semantic $A_\nu$, we now move the discussion from assessing the probability
$p(\phi_l \mid A_\nu)$ of each parameter $\phi_l$ to assessing the probability distribution
$p(\omega \mid \xi)$ of the parameters $\omega$ attached to the multinomial model, where $\xi$ denotes a given
level of knowledge.

The supervised learning proposed in this section is inspired by previous work on learning with Bayesian
networks [9] [11]. Learning is performed by a user training the system. A Bayesian framework is adopted
because of its robustness when very few user examples are available. The user provides a training
dataset $T$ of graph pattern examples in accordance with his or her semantic. With those user-provided
examples, we define for each parameter $\phi_l$ a vector $N = \{N_1, \dots, N_r\}$, with $N_j$ being the number
of instances of $\phi_l^j$, that is, the number of times that $\phi_l = \phi^j$ occurs in the examples $T$.
Note that the parameters $\omega$ of the multinomial distribution correspond to occurrence probabilities.

For the supervised evaluation of the occurrence probabilities (or the multinomial model parameters), we
introduce the Dirichlet distribution as a conjugate prior. (Note that the parameter $\omega_1$ is given by
$1 - \sum_{j=2}^{r} \omega_j$.) For a given level of knowledge $\xi$, this distribution
depends on a vector of hyper-parameters $\alpha = \{\alpha_1, \dots, \alpha_r\}$ and is expressed by

$$p(\omega \mid \xi) = \mathrm{Dir}(\omega \mid \alpha_1, \dots, \alpha_r) = \frac{\Gamma(\alpha)}{\prod_{j=1}^{r} \Gamma(\alpha_j)} \prod_{j=1}^{r} \omega_j^{\alpha_j - 1}, \tag{9}$$

where $\alpha = \sum_j \alpha_j$, $\alpha_j > 0\ \forall j \in [1, r]$, and where $\Gamma(\cdot)$ denotes the
Gamma function.

The learning of a multinomial distribution (Eq. 7) uses for initialization the Dirichlet function with all
hyper-parameters $\alpha_j$ equal to one, which represents a uniform probability density function. The prior
Dirichlet function is

$$p(\omega) = \mathrm{Dir}(\omega \mid \alpha_1^{(0)}, \dots, \alpha_r^{(0)}); \quad \forall j \in [1, r],\ \alpha_j^{(0)} = 1. \tag{10}$$

After observing the instances $\{N_j^{(1)}\}$ in a training dataset $T^{(1)}$, according to Bayes rule, the
posterior probability is

$$p(\omega \mid T^{(1)}) = \mathrm{Dir}(\omega \mid \alpha_1^{(0)} + N_1^{(1)}, \dots, \alpha_r^{(0)} + N_r^{(1)}). \tag{11}$$

After observing another training dataset $T^{(2)}$, which is assumed to be independent from $T^{(1)}$, we
obtain the new posterior

$$p(\omega \mid T^{(2)}, T^{(1)}) = \mathrm{Dir}(\omega \mid \alpha_1^{(1)} + N_1^{(2)}, \dots, \alpha_r^{(1)} + N_r^{(2)}), \tag{12}$$

where the new hyper-parameters were calculated by adding the number of times each $\phi^j$ occurred in the
training dataset $T^{(2)}$. Therefore, each observed set of data $T^{(i)}$ can be incorporated as an update of
the hyper-parameters: $\alpha_j^{(i)} = \alpha_j^{(i-1)} + N_j^{(i)}$.

Considering some training $T$ with the associated hyper-parameter vector $\alpha$, the estimation of
$p(\phi_l^j \mid A_\nu, T)$ is achieved using the Minimum Mean Square Error (MMSE) estimator of the parameter
$\omega_j$:

$$p(\phi_l = \phi^j \mid A_\nu) = E[\omega_j] = \int \omega_j \, p(\omega \mid T)\, d\omega = \frac{\alpha_j}{\alpha}. \tag{13}$$

Finally, by using the independence assumption, we obtain $p(\Phi \mid A_\nu)$ as the product
$\prod_l p(\phi_l \mid A_\nu)$.
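
The incremental update of the hyper-parameters and the resulting occurrence probabilities can be sketched in
Python as follows. This is a minimal illustration for a single similarity parameter; the function names and
the toy values (5 levels instead of 1000, level indices starting at 0) are assumptions made for readability.

```python
def update_hyperparameters(alpha, level_indices):
    """Return new Dirichlet hyper-parameters after observing one training set.

    alpha: current hyper-parameters, one per quantization level.
    level_indices: level index observed for each user-provided example.
    """
    new_alpha = list(alpha)
    for j in level_indices:
        new_alpha[j] += 1.0  # add one occurrence of level j
    return new_alpha


def posterior_mean(alpha):
    """Expected occurrence probability of each level: alpha_j divided by the sum of alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


r = 5                     # toy number of quantization levels
alpha = [1.0] * r         # uniform prior: all hyper-parameters equal to one
alpha = update_hyperparameters(alpha, [3, 3, 4])  # first training set
alpha = update_hyperparameters(alpha, [3])        # second, independent training set
probs = posterior_mean(alpha)
```

Because the Dirichlet prior is conjugate to the multinomial model, each new example only increments a
counter, which is what makes the update compatible with interactive learning.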
4.3 Estimation and update of the similarity model parameters

After some training $T$, one can use the MMSE estimator to evaluate the parameter vector $\Phi$ of the
similarity function. It is defined by

$$\hat{\Phi}_{MMSE} = E[\Phi], \tag{14}$$

where $E[\cdot]$ is the expectation operator related to the probability distribution $p(\Phi \mid A_\nu)$. We
note that the multinomial distribution does not show a clear maximum because too few examples are provided
by the user compared to the large number $r$ of values $\phi^j$, $j = 1, \dots, r$. This justifies the use of
the MMSE estimator rather than the maximum a posteriori estimator.
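
The difference between the two estimators can be illustrated for one component of the parameter vector with
the Python sketch below. It assumes that each quantization level is represented by a numeric value (here
hypothetical mid-points of 4 levels) and that the occurrence probabilities are the Dirichlet posterior
means; the function names are illustrative and not taken from the paper.

```python
def mmse_weight(alpha, levels):
    """Expected value of one weight: sum over levels of value times alpha_j / sum(alpha)."""
    if len(alpha) != len(levels):
        raise ValueError("alpha and levels must have the same length")
    total = sum(alpha)
    return sum(v * a for v, a in zip(levels, alpha)) / total


def map_level(alpha):
    """Index of the most probable level (first one in case of ties)."""
    return max(range(len(alpha)), key=lambda j: alpha[j])


levels = [0.125, 0.375, 0.625, 0.875]  # hypothetical level values
uniform = [1.0, 1.0, 1.0, 1.0]         # no example observed yet
trained = [1.0, 1.0, 1.0, 3.0]         # two examples observed in the last level

w_uniform = mmse_weight(uniform, levels)
w_trained = mmse_weight(trained, levels)
```

With the uniform hyper-parameters every level is equally probable, so the most probable level is arbitrary
(`map_level` simply returns the first one), whereas the expected value is well defined and moves smoothly as
examples are added.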

Using this parameter vector update, we perform a new evaluation of the similarity function. Therefore,
according to Eq. 1, a semantic likelihood probability can be assigned to each graph pattern $\mathcal{G}_k$
with this new estimate:

$$p(\mathcal{G}_k \mid A_\nu) = 1 - \frac{S_{\hat{\Phi}_{MMSE}}(\hat{\mathcal{G}}_0, \mathcal{G}_k)}{Z}. \tag{15}$$

We choose to use a uniform distribution to initialize the parameter vector distribution.

Note that the latter probabilities depend on an estimated reference graph $\hat{\mathcal{G}}_0$. Using Eq. 15
enables the evaluation of the likelihood probabilities $p(\mathcal{G}_k \mid A_\nu)$ for each graph pattern
$\mathcal{G}_k$. We thus obtain a new estimate for the reference graph pattern by selecting the one which
maximizes the likelihood probability:

$$\hat{\mathcal{G}}_0 = \arg\max_{\mathcal{G}_k} p(\mathcal{G}_k \mid A_\nu). \tag{16}$$

The first example provided by the user determines the initial reference graph pattern. It is then updated
according to the previous equation after each learning iteration. The reference graph pattern related to
the negative semantic is initialized and updated similarly.

4.4 Semantic labeling of graph patterns 

In the previous sections, the learning of the positive and negative semantic likelihood probabilities has
been detailed. Using the Bayesian semantic modeling of Eq. 2 leads to the update of the posterior
probabilities $p(A_\nu \mid \mathcal{G}_k)$ for each graph pattern $\mathcal{G}_k$ after each example of
spatio-temporal phenomena provided by the user. An example showing the successive probability updates
resulting from the supervised learning of field maturation semantics is presented for the sake of clarity in
figure 2. For the visual inspection of the graph morphological similarities which have been learned after
training a cloud occlusion semantic, we have plotted in figure 3 graph patterns possessing high posterior
probabilities with their associated spatio-temporal phenomena. Note that very few clouds remain in the image
sequence, as images with large cloud coverage were previously filtered out. In the two latter examples, the
features were the 3 spectral reflectances extracted from the image sequence in a spatial subset of 200x200
pixels.

Figure 2: Iterations for learning field maturation semantics. Above: for visualization purposes, the image
sequence has been subsampled temporally, and only some of the 38 images are displayed here; two pattern examples
related to a positive semantic (yellow arrows) and two other examples related to a negative semantic (black arrows)
are successively introduced; the arrows designating these pattern examples indicate
fields at the apogee of their maturation process within a temporal window of 12 time samples. Below: collections
of spatio-temporal patterns possessing the highest posterior probabilities P retrieved after each example provided by
the user; each line represents the current collection of retrieved spatio-temporal patterns which possess the highest
probabilities; these patterns are defined by spatial classes displayed in red and by the temporal windows indicated
at the center of the classes.

Figure 3: Retrieval of clouds based on morphological similarities of graph patterns. Each column
represents a spatio-temporal phenomenon which has been probabilistically labeled with a cloud occlusion
semantic. The phenomena are represented above in the form of graph patterns, that is to say projections
of parts of cluster trajectories (temporal window of 3 time samples) living in the 3D spectral feature space
(Red-Green-Blue). In the middle, for each column, the corresponding spatial class where the phenomenon
occurred is displayed in red with its posterior probability P. Below, for each column, 3 time samples
of the image sequence comprising the corresponding spatio-temporal pattern (i.e. cloud occlusion) are
displayed.

We rely on those posterior probabilities to attach semantic labels to the graph patterns, i.e. to the
spatio-temporal phenomena. We consider that a phenomenon possesses a semantic label $A_\nu$ if the posterior
probability exceeds a false alarm threshold chosen by the user.
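
The labeling rule can be sketched in Python as follows. The pattern identifiers, posterior values and
function name are hypothetical; ranking the retained patterns by decreasing posterior is an assumption added
here to mimic a retrieval result list.

```python
def label_patterns(posteriors, threshold):
    """Return the ids of patterns whose posterior exceeds the threshold, best first.

    posteriors: dict mapping a pattern id to its posterior probability.
    threshold: false alarm threshold chosen by the user.
    """
    kept = [(p, k) for k, p in posteriors.items() if p > threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [k for _, k in kept]


# Hypothetical posteriors for three graph patterns.
posteriors = {"g1": 0.9, "g2": 0.4, "g3": 0.7}
labeled = label_patterns(posteriors, threshold=0.5)
```

Raising the threshold reduces false alarms at the price of missing phenomena whose posterior is only
moderately high.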

5 Experiments 
Figure 4: Supervised learning of field maturation semantics: most likely spatio-temporal structures
retrieved in a spatial window of 200x200 pixels and ranked, from top to bottom, according to their
posterior probabilities. Each row presents a retrieved spatial class (left) with its associated time period,
which is given by the time locations of the first and last images of the row. The middle image in each row
corresponds to the time sample (within the temporal window of 12 samples) where maturation reached
its apogee.

In the experiments carried out, we first focused on spectral features extracted from the image
sequence in a spatial subset of 200x200 pixels. We trained maturation semantics specific to a field. As
these phenomena occurred over a long time period, a time window of 12 samples was selected for training.
With very few positive and negative examples, the supervised learning process enabled the retrieval of
similar events with high posterior probabilities. The retrieved spatio-temporal structures are presented
in figure 4 together with 3 significant image time samples. Note that the crop evolutions with the highest
probabilities are maturation phenomena corresponding to the specific crop sought, whereas retrieved
events with lower probabilities correspond to the maturation of similar but slightly different crops.

Experiments were then performed with spectral features extracted from the image sequence in a spatial
subset of 800x800 pixels. A search was launched to identify crops undergoing similar farming practices
within the 286 days of observation, that is to say the whole image sequence. We particularly focused on the
annual wheat farming cycle: in autumn, fields are plowed and then sown with wheat; the crop vegetates
during winter and in spring the plants grow up to maturation; at the end of summer the wheat is finally
harvested. We also identified pea farming: the evolution is characterized by the development of leaves
and ramifications in spring, flowering at the beginning of June and a harvest in August. A
single example of a crop of wheat or peas undergoing such a farming process was provided to the system
for training.

Results are displayed in figure 5. In order to understand why the greatest posterior probabilities have
been attached to those structures, a careful inspection of the image sequence was performed. This
laborious task enabled us, for example, to identify similar crops which were not retrieved because of an
early harvest. Let us remark that the distribution of the classes is quite sparse. Therefore, this example
demonstrates the capacity of the proposed learning approach to recognize complex phenomena, spread
in space and undergoing similar changes in time. Note that achieving a similar task by visual inspection
would have been considerably time-consuming.

Let us also mention the limitations induced by the graph matching optimization algorithm which has 
been used in these experiments. Selecting a limited spatial window (first experiments) or defining 
searched phenomena within a maximum temporal window (last experiment) considerably reduces the 
number of graph patterns Qk contained in the whole graph Q. Moreover, in the previous experiments 
spatio-temporal phenomena have been coded with simple graph patterns*. Therefore, the calculation of 
graph likelihoods has been performed in real time and posterior probabilities have appeared to be relevant 
to the different user semantics. Nevertheless, learning semantics attached to numerous spatio-temporal 
phenomena coded with overly dense graph patterns may be time-consuming and fail to meet real-time 
requirements. Indeed, the combinatorial explosion problem for matching vertices and edges is accentuated 
for dense graph patterns. In that case, our simple optimization algorithm may not reveal sufficiently accurate 
parameter minima and may result in weak learning. Implementing a better optimization algorithm, 
based for example on graph cuts, is therefore required for further experiments. 

*For more details on tuning graph pattern complexity, please refer to [8]. 

Figure 5: Recognition of particular farming practices. Crops of wheat or peas related to particular farming 
practices were retrieved by supervised learning in a spatial window of 800x800 pixels. Similar spatio-temporal 
structures defined within a maximum temporal window constituted by 38 time samples of the sequence were 
retrieved in space. A single example of evolution related to a crop of wheat or peas enabled the recognition of fields 
undergoing similar farming practices (same harvest period, plowing, etc). Above: for both farming practices 
(wheat or pea), retrieved multitemporal classes are displayed with shaded colors (red or green) according to their 
posterior probabilities p appearing in the caption on the right. Below: for visualization purposes, the image 
sequence is represented here by the first image (on the left), the last image (on the right) and a single intermediate 
image (in the middle). 
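
To give an order of magnitude for this combinatorial explosion, the following sketch counts the candidates that a brute-force matching would have to examine. It assumes, purely for illustration, that matching consists in assigning each pattern vertex to a distinct vertex of the searched graph and then checking every pattern edge. The function names and the sizes used are hypothetical, and the sketch does not describe the optimization algorithm used in the experiments.

```python
from math import perm


def candidate_assignments(pattern_vertices: int, graph_vertices: int) -> int:
    """Ordered one-to-one assignments of pattern vertices to graph vertices.

    Assumption: a match maps every pattern vertex to a distinct graph vertex.
    """
    return perm(graph_vertices, pattern_vertices)


def brute_force_edge_checks(pattern_vertices: int, pattern_edges: int,
                            graph_vertices: int) -> int:
    """Upper bound on edge comparisons: every assignment checks every pattern edge."""
    return candidate_assignments(pattern_vertices, graph_vertices) * pattern_edges


if __name__ == "__main__":
    graph_vertices = 20  # hypothetical size of the searched graph window
    for k in (2, 4, 6):
        sparse_edges = k - 1            # chain-like pattern
        dense_edges = k * (k - 1) // 2  # fully connected pattern
        print(k,
              brute_force_edge_checks(k, sparse_edges, graph_vertices),
              brute_force_edge_checks(k, dense_edges, graph_vertices))
```

Under these assumptions the number of assignments grows with the number of pattern vertices, and a denser pattern multiplies the work done per assignment, which is consistent with the need to restrict the spatial or temporal window.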

6 Conclusion and perspectives 

This work is an attempt to solve the complex problem of recognizing various spatio-temporal phenomena 
in satellite image sequences. The proposed concept, developed in a Bayesian framework, models a user 
semantic by a parametric model evaluating the similarities of graph patterns, which code spatio-temporal 
phenomena. Discretized parameter distributions related to the similarity model are learned in 
a supervised way by updating the parameters of multinomial models. The learning process is based on 
a Dirichlet model and user-provided examples related to positive and negative semantics. 
Based on results obtained on a SPOT image sequence, the method appears to be a fast and relevant way to retrieve 
user-specific spatio-temporal patterns. The experiments have also revealed that the optimization algorithm 
used for evaluating graph pattern similarity constitutes a crucial issue for further developing the 
learning capabilities. 
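
As a reminder of how a Dirichlet model updates a multinomial distribution over a discretized parameter, a minimal sketch is given below. It shows only the generic conjugate update (prior pseudo-counts plus observed counts) and does not reproduce the Bayesian network or the handling of positive and negative semantics of the proposed method. The bin layout, the uniform prior and the counts are hypothetical.

```python
from typing import List


def dirichlet_update(alpha: List[float], counts: List[int]) -> List[float]:
    """Conjugate update: add the observed bin counts to the Dirichlet pseudo-counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same number of bins")
    return [a + c for a, c in zip(alpha, counts)]


def posterior_mean(alpha: List[float]) -> List[float]:
    """Expected multinomial probabilities under a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


if __name__ == "__main__":
    # Hypothetical setting: a similarity parameter discretized into 4 bins,
    # starting from a uniform prior (one pseudo-count per bin).
    alpha = [1.0, 1.0, 1.0, 1.0]
    # Hypothetical counts: how often each bin was observed in user examples.
    alpha = dirichlet_update(alpha, [0, 3, 1, 0])
    print(posterior_mean(alpha))
```

Because the update only adds counts, each new user example can be incorporated incrementally without revisiting the previous ones.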

We believe that the learning concept we have presented constitutes a valuable tool in view of the numerous 
potential applications. Collecting ground truth data or available expert knowledge related to 
agriculture or other applications will be the next step towards an exhaustive assessment of the proposed 
spatio-temporal recognition approach. Moreover, such a supervised learning method can be applied to 
multidimensional graphs coding any kind of data. Using this approach in other fields, such as molecular biology or 
telecommunication networks, for the recognition of particular graph patterns constitutes a very interesting 
perspective. 